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. In many practical cases, a sensitivity analysis or an optimization of a complex time consum- 

ing computer code requires to build a fast running approximation of it - also called surrogate 
^ | ■ model. We consider in this paper the problem of building a surrogate model of a complex 

£^ . computer code which can be run at different levels of accuracy. The co-kriging based surrogate 

model is a promising tool to build such an approximation. The idea is to improve the surrogate 
model by using fast and less accurate versions of the code. We present here a new approach 
to perform a multi-fidelity co-kriging model which is based on a recursive formulation. The 
strength of this new method is that the co-kriging model is built through a series of indepen- 
dent kriging models. From them, some properties of classical kriging models can naturally be 
extended to the presented co-kriging model such as a fast cross-validation procedure. More- 
^vD . over, based on a Bayes linear formulation, an extension of the universal kriging equations are 

provided for the co-kriging model. Finally, the proposed model has the advantage to reduce 
■ the computational complexity compared to the previous models. The multi-fidelity model is 

successfully applied to emulate a hydrodynamic simulator. This real example illustrates the 
efficiency of the recursive model. 

Keywords, surrogate models, universal co-kriging, recursive model, fast cross-validation, 
multi-fidelity computer code. 
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1 Introduction 

Computer codes are widely used in science to describe physical phenomena. Advances in 
physics and computer science lead to increased complexity for the simulators. Therefore, it 
is common for the physicist to have different versions of a code which have different levels of 
accuracy and cost. Usually, to design and analyze a complex computer code, a fast approxi- 
mation of it - also call surrogate model - is built in order to avoid prohibitive computational 
cost. 

A very popular method to build surrogate model is the Gaussian process regression, also 
named kriging model. It is a particular class of surrogate models which makes the assumption 
that the response of the complex code is a realization of a Gaussian process. This method 
was originally introduced in used in geostatistics by |Krige, 1951 and |Matheron, 1963 and 



it was then proposed in the field of computer experiments by |Sacks et al., 1989 . During the 
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last decades, this method has become widely used and investigated. The reader is referred 
to the books of |Stein, 1999 , |Santner et al., 2003 and |Rasmussen and Williams, 20 06] for 
more detail about it. 

A question of interest is how to build a predictive model using data from experiments 
of multiple levels of fidelity. Indeed, complex computer codes can be extremely expensive 
and sometimes we cannot have, under reasonable time constraint, enough simulations to 
sample the input parameter space with enough density and range. In this case, it could 
be worth using fast versions of the code (which can be old or coarse versions of it) to im- 
prove its approximation. Our objective is hence to build a multi-fidelity surrogate model 
which is able to use the information obtained from the fast versions of the code. Such mod- 
els have been presented in the literature | Craig et al., 199"8] , |Kennedy and O'Hagan, 2000| , 



|Forrester et al., 2007] , |Qian and Wu, 2008| and |Cumming and Goldstein, 2009 



The first multi-fidelity model proposed in |Craig et al., 199"8] is based on a linear regres- 



sion formulation. Then |Cumming and Goldstein, 2009 have improved this model by using 



a Bayes linear formulation. The reader is referred to |Goldstein and Wooff, 2007 for further 
details about the Bayes linear approach. The methods suggested by |Craig et al., 1998] and 
|Cumming and Goldstein, 2009| have the strength to be relatively not computationally ex- 
pensive but as it is based on a linear regression formulation, it could suffer from a lack of 
accuracy. Another approach is to use an extension of kriging for multiple response models 
which is called co-kriging. The idea was implemented by |Kennedy and O'Hagan, 2000| who 
present a co-kriging model based on an autoregressive relation between the different code 
levels. This method has become very popular and many authors have developed it. In partic- 
ular, |Forrester et al., 2007| presents the use of co-kriging for multi-fidelity optimization and 
|Qian and Wu, 2008| proposed a Bayesian formulation of it. 

The strength of the co-kriging model is that it gives very good predictive models but 
it is often computationally expensive, especially when the number of simulations is large. 
Furthermore, large data set can generate problems such as ill-conditioned covariance matrices. 
It is even more difficult to deal with these problems for co-kriging since the total number of 
observations is the sum of the observations at all code levels. 

In this paper, we adopt a new approach for multi-fidelity surrogate modeling which uses 
a co-kriging model but with a recursive formulation. In fact, our model is able to build a s- 
level co-kriging model by building s independent krigings. This approach significantly reduces 
the complexity of the model since it divides the total number of observations on groups of 
observations corresponding to the ones of each level. Therefore, we will have s sub-matrices 
to invert which is less expensive than a big one and the estimation of the parameters can be 
performed separately. Finally, one of the main strengths of this approach is that it allows us to 
naturally extend classical results of kriging to the considered co-kriging model. In particular, 
we generalize and adapt the equations of the fast cross-validation proposed by |Dubrule, 1983| 
and we propose an universal co-kriging which is the natural extension of the well known 
universal kriging equations |Matheron, 1969 . 



2 Multi-fidelity Gaussian process regression. 

In a first subsection, we briefly present a first approach to build multi-fidelity model suggested 
by |Kennedy and O'Hagan, 2000| that uses a co-kriging model. In the next subsection, we de- 
tail our recursive approach to build a multi-fidelity recursive model. The recursive formulation 
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of the multi-fidelity model is the first novelty of this paper. We will see in the next sections 
that the new formulation allows us to find very important results about the co-kriging model 
and to reduce its computational complexity. Furthermore, we prove that the two models are 
equivalent. 



2.1 The classical autoregressive model. 

Let us suppose that we have s levels of code (^(x))t=i,..., s sorted by increasing order of fidelity 
and modeled by Gaussian processes (^t(x))t=i a , x £ Q. We hence consider that z s (x) is 
the most accurate and costly code that we want to surrogate and (zt(x))t=\ ... s _i are cheaper 
versions of it with z\(x) the less accurate one. We consider the following autoregressive model 
with t = 2, . . . , s: 

Z t (x) = p t -i(x)Z t -i{x) + 6 t (x) 

Z t -i{x) ± 6t(x) (1) 



Pt-i(x) = gt-xix)^ 



pt-i 



where: 
and: 



S t (x) ~ GV{ff{x)P t , °lr t {x, x')) (2) 



Zi{x) ~ gVUf(x)f3 u a 2 iri (x,x')) (3) 

Here, T stands for the transpose, _L denotes the orthogonality relationship, QV designs a 
Gaussian Process, gf_i(x) is a vector of qt-i regression functions, ff(x) is a vector of pt 
regression functions, rt(x,x') is a correlation function, j3t is a ^-dimensional vector, f3 pt _ 1 is a 
(ft_i-dimensional vector and a 2 is a real. Since we suppose that the responses are realizations 
of Gaussian processes, the multi-fidelity model can be built by conditioning by the known 
responses of the codes at the different levels. 

The previous model comes from the article of |Kennedy and O'Hagan, 2000 . It is induced 



by the following assumption: Vx E Q, if we know Zt-i(x), nothing more can be learned about 
Zf(x) from Zt-i(x') for x ^ x'. 

Let us consider = (Zf , . . . ,Zj) T the Gaussian vector containing the values of the 
random processes (Zt(x))t=\ t ... s at the points in the experimental design sets (Dt)t=i,..., s with 
D s ^ -Ds-i C ■ ■ ■ C D\ and = (zf , . . . , zJ) T a vector containing the values of (zt(x))t=i t ... tS 
at the points in [Dt)t=l,...,s- The nested property of the experimental design sets is not neces- 
sary to build the model but it allows for a simple estimation of the model parameters. Since the 
codes are sorted in increasing order of fidelity it is not an unreasonable constraint for practical 
applications. By denoting /3 = (0i , ■ ■ ■ ,ff) T the trend parameters, f3 p = (/3j l5 • • • ,f3j s _ 1 ) T 
the trend of the adjustment parameters and a 2 = (a 2 , . . . ,a 2 ) the variance parameters, we 
have: 

Vx G Q [Z s (x)\Z^ = z( s \f3,l3 p ,a 2 ]~N{m Zs (x),4 3 (x)) 

where: 

m Zs (x) = ti s {x) T f3 + t s {x) T Vr\zM - H 8 p) (4) 

and: 

slAx) = vUx) - t s (x) T V- l t s (x) (5) 
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The Gaussian process regression mean mz s (x) is the predictive model of the highest fidelity 
response z s (x) which is built with the known responses of all code levels z^ s \ The variance 
s 2 Za (x) represents the predictive mean squared error of the model. 

The matrix V s is the covariance matrix of the Gaussian vector Z^ s \ the vector t s (x) is the 
vector of covariance between Z s (x) and Z^ s \ H s f3 is the mean of Z^ s \ h' s (x) T '/3 is the mean of 
Z s (x) and (x) is the variance of Z s (x). All these terms are built in terms of the experience 
vector at level t © and to the covariance between Zt(x) and Z t /(x') ([7J) and ([8]). 




't-i 



cov(Z t (x),Z tl (x')\a 2 ,l3,p p ) = [ J[pi{x) ] cov(Z t ,(x), Z t ,(x')\a 2 , /3, /3 p ) (7) 

with : 



\i=t> 



t-i 



cov(Z t (x),Z t (x')\a 2 ,(3,(3 p ) =zZ a H U Pi( x M x ') | r i( x > x ') ( 8 ) 

j=i \i=j 

Remark. The model (Q]) is an extension of the model of |Kennedy and O'Hagan, 2000| in 
which the adjustment parameters pt(x)t=2,...,s do not depend on x. We show in a practical 
application that this extension is worthwhile. 

2.2 Recursive multi-fidelity model. 

In this section, we present the new multi-fidelity model which is based on a recursive formu- 
lation. Let us consider the following model for t = 2, . . . , s : 

Z t (x) = p t -i{x)Z t -i(x) + 5 t (x) 

Z t _i(x) J. 5 t (x) (9) 



Pt-xix) =g'f-i{x)P l 



pt-i 



where Zt—\{x) is a Gaussian process with distribution [Zt-i{x)\Z^~ 1 ^ = z^" 1 ), (it—\, P pt _ 2 , cr 2 _i\ 
and D s C D s _\ C • • • C D\. The unique difference with the previous model is that we express 
Zt(x) (the Gaussian process modeling the response at level t) as a function of the Gaussian 
process Zt-i(x) conditioned by the values z^* -1 ) = (z\, . . . , Zt-i) at points in the experimen- 
tal design sets (-Di)i=i,...,t-i- We note that, as in the previous model, the nested property 
is assumed to allow efficient estimations for the model parameters. The Gaussian processes 
{&t{x))t = 2 ... s have the same definition as previously and we have for t = 2, . . . , s: 

'z t (x)\Z® =z®,p t ,0 /H _ 1 ,o%] ~N{p Zt (x),s 2 zM (10) 

where: 

p Zt (x) = p^^nzt-M + fT( x )Pt + rJ(x)Ri l (z t - p t -i(D t ) - F t f3 t ) (11) 



4 



and: 

4» = P?-i(z)4 t __» + °1 (1 - rf(x)R^r t (x)) (12) 
The notation represents the element by element matrix product. Rt is the correlation 
matrix R t = (rt(x,x')) X:X ' & D t and rf(x) is the correlation vector rf(x) = (r t (x,x')) x ' & Df 
We denote by pt(Dt~\) the vector containing the values of pt{x) for x E Dt—i, zt(Dt-i) the 
vector containing the known values of Zt(x) at points in Dt-i and Ft is the experience matrix 
containing the values of ft(x) T on Dt- 

The mean pz t { x ) ls the surrogate model of the response at level t, 1 < t < s, taking 
into account the known values of the t first levels of responses (£i)i=i an d the variance 
a Zt (x) represents the mean squared error of this model. The mean and the variance of the 
Gaussian process regression at level t being expressed in function of the ones of level t — 1, 
we so have a recursive multi-fidelity metamodel. Furthermore, in this new formulation, it 
is clearly emphasized that the mean of the predictive distribution does not depend on the 
variance parameters (of )t=i s . This is a classical result of kriging model which states that 
for covariance kernels of the form k(x,x') = a 2 r(x,x'), the mean of the kriging model is 
independent of a 2 . 



Remark. The previous comment highlights an important strength of the recursive formula- 
tion. Indeed, contrary to the formulation suggested in |Kennedy and O 'Haga n72000| , once the 
multi-fidelity model is built, it provides the surrogate models of all the responses (zt(x))t=i... jS . 

Furthermore, from this formulation, we can directly deduce that building a s-level co- 
kriging is equivalent to build s independent krigings. This implies a reduction of the model 
complexity. Indeed, the inversion of the matrix V, of size Ya=i n « x Si=i n i * s more expensive 
than the inversions of s matrices (Rt)t=i,...,s of size (nt X nt)t=X,...,s where nt corresponds to 
the size of the vector zt at level t = 1, . . . , s. We also reduce the memory cost since storing 
the matrix V s required more memory than storing the s matrices (Rt)t=l ... s- Then, we note 
that the model with this formulation is more interpretable since we can deduce the impact of 

each level of response into the model error through (o\ t {x))t=i s - Finally we will see that it 

allows us to adapt classical kriging results to the multi-fidelity co-kriging model (e.g. universal 
kriging and fast cross-validation). 

We have the following proposition. 

Proposition 1 Let us consider s Gaussian processes (Zt{x))t=i s and = (Zt)t=\,..., s 

the Gaussian vector containing the values of [Zt {x))t=\,..., s at points in {Dt)t=i,..., s with D s C 
D s _i C ■ ■ ■ C D\. If we consider the mean and the variance ^ and ([5]) induced by the model 
when we condition the Gaussian process Z s (x) by the known values z^ s ' of and the 
mean and the variance and induced by the model \16\) when we condition Z s {x) by 
z( s \ then, we have: 

p Zs (x) = m Zs (x) 
o\X x ) = 4 S 0) 

The proof of the proposition is given in Appendix lA.il It shows that the model of |Kennedy and O'Ha gan, 200C 
and the recursive model (|16p have the same mean and covariance function. Therefore, pre- 
dictive distributions of the two models are identical and the recursive model has the same 
strengths as the one of |Kennedy and O'Hagan, 2000 to which we add the benefits mentioned 
in the previous remark. 
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2.3 Parameter estimations 

We present in this section a Bayesian estimation of the parameter if) = (/3,/3p,<7 2 ) focusing 
on conjugate and non-informative distributions for the priors. This allows us to obtain closed 
form expressions for the estimations of the parameters. Furthermore, from the non-informative 
case, we can obtain the estimates given by a maximum likelihood method. The presented 
formulas can hence be used in a frequentist approach. We note that the recursive formulation 
directly shows us that the estimations of the parameters (^4, P Pt _ 1} 0f)t=i,...,s and (/3i,cr 2 ) can 
be performed separately. 

We use in this section the notation info to design the case where all the priors are infor- 
mative and ninfo to design the case where all the priors are non-informative. It would be 
possible to address the case of a mixture of informative and non-informative priors. For the 
non-informative case, we use the "Jeffreys priors" [Jeffreys, 1961| : 

KAk?) oc 1, p{al)^\, K^-nftl^UtVl, pKV*~ 1} ) cc A ( 13 ) 

where t = 2, . . . , s. For the informative case, we consider the following conjugate prior distri- 
butions: 

b t-i \ J„ _ J ( Vf_ x 



v t 



p 



with b\ a vector a size p\, b p t _ x a vector of size qt-i, a vector of size pt, V\ api X p\ matrix, 

Vf-i a ?t— l x It— 1 matrix, vf a pt x pt matrix and 011,71,0:4,74 > 0. These informative 
priors allow the user to prescribe the means and the variances of all parameters. The choice 
of conjugate priors allows us to have closed form expressions for the parameter estimations. 
Indeed, we have: 

[ftlzi.of] -./^(Ei^Ei) [Ppt-vPtlz®,^] -A/^+^^S*) (14) 
where, for t > 1: 

< lHT ^ Ht + ^ inf „ f + £ M info 

[Hf^-Ht]- 1 ninfo [# t T ±|r^] ninfo 

with iJi = F\ and for t > 1, Ht = [Gt-i © (^-i(Dt)l^_ 1 ) i^J where Gt-i is the experience 
matrix containing the values of gt-\{x) T in Z^. Furthermore, we have for t > 1: 

[a 2 ^] ~ XS(a t , ^) (16) 



where: 

7t + (6* - A t ) T (T4 + [HfR^Ht]- 1 )- 1 ^ - \ t ) + Q t info 

ninfo 
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with Q t = (zt - H t \ t ) T Rt\z t - H t \ t ) , %t = {HjR^HtF^HjR^zt and : 

_ f f + a t info 
at ~\ ^np^l ninfo 

with the convention go — 0. 

We highlight that the maximum likelihood estimates for the parameters (5\ and (f3 pt _ 1 ,(3t) 
are given by the means of the posterior distributions of the Bayesian estimations in the non- 
informative case. Furthermore, the restricted maximum likelihood estimate of the variance 
parameter of can also be deduced from the posterior distribution of the Bayesian estimation 
in the non-informative case and is given by of EML = ^a- The restricted maximum likelihood 
estimation is a method which allows us to reduce the bias of the maximum likelihood estimation 
|Patterson and Thompson, 1971] . 

3 Universal co-kriging model 

We see in equation (fTU|) that the predictive distribution of Z s (x) is conditioned by the observa- 
tions z and the parameters /3, f3 p and a 2 . The objective of a Bayesian prediction is to integrate 
the uncertainty due to the parameter estimations into to the predictive distribution. Indeed, 
in the previous subsection, we have expressed the posterior distributions of the variance pa- 
rameters (of )t=i s conditionally to the observations and the posterior distributions of the 

trend parameters /3i and (f3 Pt _ 1 , /3t)t=2,...,s conditionally to the observations and the variance 
parameters. Thus, using the Bayes formula, we can easily obtain a predictive distribution 
only conditioned by the observations by integrating into it the posterior distributions of the 
parameters. 

Nonetheless, this predictive distribution is clearly not Gaussian and can be expensive to 
obtain. In particular, we cannot have a closed form expression for the predictive distribution. 
Therefore, it is relevant to consider in our analysis only the mean R[Z 3 (x)\zW = z^} and 
the variance Yai(Z s (x)\Z^ = z^). 

The following proposition giving the closed form expressions of the mean and the variance 
of the predictive distribution only conditioned by the observations is a novelty. The proof of 
this proposition is based on the recursive formulation which emphasizes the strength of this 
new approach. 

Proposition 2 Let us consider s Gaussian processes (Zt(x))t=i,..., s an d = {Zt)t=i,... yS 
the Gaussian vector containing the values of (Zt(x))t=i,..., s at points in (Dt)t=i,...,s with D s C 
D s _\ C ■ ■ ■ C D\. If we consider the conditional predictive distribution in equation U0\) and 
the posterior distributions of the parameters given in equations fl 11$ and Iil6\) , then we have 
fort = l,.. .,s: 

E[Z t (x)\Z® = z^} = hf(x)E t u t + rf(x)Rt 1 (z t - H t Y, t v t ) (17) 

with hj = fl, H ± = F 1 and for t > 1, hf(x) = ( gt^xf^Zt-^Zt-x = *t-i] f?{x) ) 
and Ht = [Gt-i (zt-i(Dt)lJ t _ 1 ) Ft]. Furthermore, we have: 

Var(Z f (x)|zW = *(*>) = p2_i(^)Var(Z 4 _ 1 (x)|Z^ 1 ) = z^) + (l - rf(x)R^rf(x)) 

+ (hf - rJ{x)R^H t ) S 4 (hf - rJ{x)R^H t ) T 

(18) 

with pt-i(x) = [^t]l,...,q t -i- 
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The proof of Proposition [2] is given in Appendix IA, 21 We note that, in the mean of the pre- 
dictive distribution, the parameters have been replaced by their posterior means. Furthermore, 
in the variance of the predictive distribution, the variance parameter has been replaced by its 
posterior mean and the term (hf — rf(x)R^ HA St (Kf — rj \x)R^ Htj has been added. It 
represents the uncertainty due to the estimation of the regression parameters (including the 
adjustment coefficient). We call these formulas the universal co-kriging equations due to their 
similarities with the well-known universal kriging equations (they are identical for s = 1). 



4 Fast cross-validation for kriging and co-kriging surrogate mod- 
els 

The idea of a cross-validation procedure is to split the experimental design set into two dis- 
joints sets, one is used for training and the other one is used to monitor the performance 
of the surrogate model. The idea is that, the performance on the test set can be used as 
a proxy for the generalization error. A particular case of this method is the Leave-One-Out 
Cross- Validation (noted LOO-CV) where n test sets are obtained by removing one observation 
at a time. This procedure can be time-consuming for a kriging model but |Dubrule, 1983| , 
|Rasmussen and Williams, 2006 and |Zhang and Wang, 2009| show that there are computa- 



tional shortcuts. We present in this section their adaptation for co-kriging models. Further- 
more, the cross-validation equations proposed in this section extend the previous ones even for 
s = 1 (i.e. the classical kriging model) since they do not suppose that the regression and the 
variance coefficients are known. Therefore, those parameters are re-estimated for each training 
set. We note that the re-estimation of the variance coefficient is a novelty which is important 
since fixing this parameter can lead to big errors for the estimation of the cross-validation 
predictive variance when the number of observations is small or when the number of points in 
the test set is important. 

If we denote by £<, the set of indices of nt es t points in D s constituting the test set Dt es t an d 
£tj 1 < t < s, the corresponding set of indices in Dt - indeed, we have D s C D s _i C ■ ■ ■ C D±, 
therefore Dt es t C Df. The nested experimental design assumption implies that, in the cross- 
validation procedure, if we remove a group of points from D s we can also remove it from Dt, 
1 < t < s. 

The following proposition gives the vectors of the cross-validation predictive errors and 
variances at points in the test set Dtest when we remove them from the t highest levels of 
code. In the proposition, we consider that we are in the non-informative case for the parameter 
estimation (see Section 12. 3p but it can be easily extended to the informative case presented in 
Section 12.31 



Notations: If £ is a set of indices, then -A^j is the sub-matrix of elements £ x £ of A, ar^i 
is the sub-vector of elements £ of a, £i represents the matrix B minus the rows of index £, 
Cr-^-^i is the sub-matrix of C in which we remove the elements of index — £ x — £ and C[-£,f] 
is the sub-matrix of C in which we remove the rows of index £ and keep the columns of index 

e 

Proposition 3 Let us consider s Gaussian processes (Zt(x))t=i t ... tS an d = (Zt)t=i,...,s 
the Gaussian vector containing the values of {Zt{xj)t=\ 8 at points in (Dt)t=l ... s with D s C 
D s _\ C • • • C D\. We note Dtest a se t made with the points of index £ s of D s and ^ the 
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corresponding points in Dt with 1 < t < s. Then, if we note £z s £ a the errors (i.e. real values 
minus predicted values) of the cross-validation procedure when we remove the points of Dt es t 
from the t highest levels of code, we have: 

(ez.,6, - Ps-i{D t est) ez._LS.-i) [K 1 ] = i z s ~ H s \ s -^)\ [£J (19) 

with e z u ,£ u = when u<t, A s _ 6 ] h6] K s [# s ][„ 6] ) = [H T ] hU K s z s (D s \ D test ) and: 

Furthermore, if we note a^ s ^ s the variances of the corresponding cross-validation procedure, 
we have: 

4 s , Cs = ^-i(Aest) ol.-xA-x + <-6 dia § f( W r&Al) ^ + Vs < 21 > 



with: 



2 (z s (D s \ Acst) - [5a][-e.]Aa,_{,) if s \ Aest) - 

Qs—1 Strain 

where a\ _^ = w/ien u < t, nt ra in is the length of the index vector £ SJ H s = [G s -i 
(z.-ip.jlf^) F s ] and: 

V S =U T S {[Hj^KslH^y'Us (23) 
withU s = ((7>\ ; ) ' [Rj'Hs]^ 

We note that these equations are also valid when s = 1, i.e. for kriging model. We hence have 
closed form expressions for the equations of a /c-fold cross-validation with a re-estimation of the 
regression and variance parameters. These expressions can be deduced from the universal co- 
kriging equations. The complexity of this procedure is essentially determined by the inversion 
of the matrices I Ti?" 1 ] rc , ,) of size nt es t x ntest- Furthermore, if we suppose the 

parameters of variance and/or trend as known, we do not have to compute o~f _^ and/or \t,—£ t 

(they are fixed to their estimated value, i.e. of _£ t = 2(a t -i) and "V— 6 = ^t v t> see Section [2~3]) 
which reduces substantially the complexity of the method. These equations generalize those 
of |Dubrule, 1983| and |Zhang and Wang, 2009 where the variance of _£ t is supposed to be 



known. Finally, the term V s corresponds to the added term due to the parameter estimations 
in the universal co-kriging. Therefore, if the trend parameters are supposed known, this term 
is equal to 0. The proof of Proposition [3] is given in Appendix IA.3I 

5 Application: hydro dynamic simulator 

In this section we apply our co-kriging method to the hydrodynamic code "MELTEM". This 
code simulates a second-order turbulence model for gaseous mixtures induced by Richtmyer- 
Meshkov instability |Gregoire et al., 2005 . Two input parameters x\ and x% are considered. 



They are phenomenological coefficients used in the equations of the energy of dissipation 
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of the turbulent flow. These two coefficients vary in the region [0.5,1.5] x [1.5,2.3]. The 
considered code outputs, called eps and L c , are respectively the dissipation factor and the 
mixture characteristic length. The simulator is a finite-elements code which can be run at 
s = 2 levels of accuracy by altering the finite-elements mesh. The simple code Z\{.), using a 
coarse mesh, takes 15 seconds to produce an output whereas the complex code Z2Q, using 
a fine mesh, takes 8 minutes. The aim of the study is to build a prediction as accurate 
as possible using only a few runs of the complex code and to assess the uncertainty of this 
prediction. In particular, we use 5 runs for the complex code Z2(x) and 25 runs for the cheap 
code z\(x). Then, we build an additional set of 175 points to test the accuracy of the models. 
Furthermore, no prior information is available: we are hence in the non-informative case. 

5.1 Estimation of the hyper-parameters 

In the previous sections, we have considered the correlation kernels (rt(x, a/))t=i a as known. 
In practical applications, we choose these kernels in a parameterized family of correlation ker- 
nels. Therefore, we consider kernels such that rt(x, x') = rt(x, x'; (fit)- The hyper-parameter fa 
can be estimated by maximizing the concentrated restricted log-likelihood |Santner et al., 2 003 
with respect to fa: 

log (|det (Rt)\) + (m — Pt — qt-i) log (^ 2 ,eml) (24) 

with the convention qo = and <J^ EML is the restricted likelihood estimate of the variance 
erf (see Section I2.3P . This minimization problem has to be solved numerically. It is a com- 
mon choice to consider the hyper-parameters as known and to estimate them by maximum 
likelihood |Santner et al., 2003| . 

It is also possible to estimate the hyper-parameters (fa)t=l ...,s by minimizing a loss func- 
tion of a Leave-One-Out Cross- Validation procedure. Usually, the complexity of this procedure 
is O ^(X^i=i n <) 4 ) ■ Nonetheless, thanks to Proposition [3l it is reduced to O (X^i=i n i) since it 
is essentially determined by the inversions of the s matrices (R^ )t=l ... s- Therefore, the com- 
plexity for the estimation of (fa)t=\,...,s is substantially reduced. Furthermore, the recursive 
formulation of the problem allows us to estimate the parameters {fa)t=i,...,s one a t a time by 
starting with fa and estimating fa, t = 2, . . . , s, one after the other. 

5.2 Comparison between kriging and multi-fidelity co-kriging 

Before considering the real case study, we propose in this section a comparison between the 
kriging and co-kriging models when the number of runs n 2 for the complex code varies such 
that fi2 = 5, 10, 15, 20, 25. For the co-kriging model, we consider n\ = 25 runs for the cheap 
code. In this section, we focus on the output eps. 

To perform the comparison, we generate randomly 500 experimental design sets {D 2 ^, ^i,i)j= 
such that -D2,i C -Di,«, i = 1, . . . , 500, has n\ points and has n 2 points. 

We use for both kriging and co-kriging models a Matern| covariance kernel and we con- 
sider p, f3\ and f3 2 as constant. The accuracies of the two models are evaluated on the test 
set composed of 175 observations. From them, the Root Mean Squared Error (RMSE) is 

1/2 

computed: RMSE = £ili(/^ 2 (x\ cst ) - z 2 {xf st )) 2 ^ . 

Figure Q] gives the mean and the quantiles of probability 5% and 95% of the RMSE com- 
puted from the 500 sets (D2 %, D\ i)j=i ... 500 when the number of runs for the expensive code 
ni varies. In Figure [H we can see that the errors converge to the same value when ni tends 
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Figure 1: Comparison between kriging and co-kriging with m = 25 runs for the cheap code 
(500 nested design sets have been randomly generated for each r^)- The circles represent the 
averaged RMSE of the co-kriging, the triangles represent the averaged RMSE of the kriging, 
the crosses represent the quantiles of probability 5% and 95% for the co-kriging RMSE and 
the times signs represent the quantiles of probability 5% and 95% of the kriging RMSE. Co- 
kriging predictions are better than the ordinary kriging ones for small ri2 and they converge 
to the same accuracy when ri2 tends to n\ = 25. 



to n\. Indeed, due to the Markov property given in Section [2.1[ when D<i = D\, only the 
observations zi are taken into account. Furthermore, we can see that for small values of ri2, 
it is worth considering the co-kriging model since its accuracy is significantly better than the 
one of the kriging model. 

5.3 Nested space filling design 

As presented in Section [2] we consider nested experimental design sets: Vi = 2, . . . , s D( C 
Dt-\. Therefore, we have to adopt particular design strategies to uniformly spread the inputs 
for all Dt- A strategy based on Orthogonal array-based Latin hypercube for nested space- 
filling designs is proposed by |Qian et al., 2009] . 

We consider here another strategy for space-filling design, described in the following algorithm, 
which is very simple and not time-consuming. The number of points rit for each design Dt is 
prescribed by the user, as well as the experimental design method applied to determine the 
coarsest grid D s used for the most expensive code z s (see |Fang et al., 2 006 for a review of 
different methods). 
ALGORITHM 

build D s = {Xj}j=\,...,n e with the experimental design method prescribed by the user, 
for t = s to 2 do: 
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build design A-i with the experimental design method prescribed by the user, 
for % = 1 to rit do: 

find x * ^ G A-l the closest point from xf^ E A where j G [l,rt£_i]. 

remove x} ^ from A—i . 
end for 

A-i = A-iU A- 

end for 

This strategy allows us to use any space-filling design method and it conserves the initial 
structure of the experimental design A of the most accurate code, contrarily to a strategy 
based on selection of subsets of an experimental design for the less accurate code as presented 
by |Kennedy and Q'Hagan, "20001 and |Forrester et al, 2007| . We hence can ensure that A 
has excellent space-filling properties. Moreover, the experimental design A-i being equal to 
A-i U A; this method ensure the nested property. 

In the presented application, we consider n<i = 5 points for the expensive code z^ix) and 
n\ = 25 points for the cheap one z±(x). We apply the previous algorithm to build A and A 
such that A C A- F° r the experimental design set A?) we use a Latin-Hypercube-Sampling 
|Stein, 1987] optimized with respect to the S-optimality criterion which maximizes the mean 
distance from each design point to all the other points |Stocki, 2005| . Furthermore, the set 
A is built using a maximum entropy design |Shewry and Wynn, 1987] optimized with the 



Fedorov-Mitchell exchange algorithm |Currin et al., 199T . These algorithms are implemented 



in the library R lhs. The obtained nested designs are shown in Figure [2j 
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Figure 2: Nested experimental design sets for the hydrodynamic application. The crosses 
represent the n\ = 25 points of the experimental design set A of the cheap code and the 
circles represent the ri2 = 5 points of the experimental design set A? of the expensive code. 
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5.4 Multi-fidelity surrogate model for the dissipation factor eps 



We build here a co-kriging model for the dissipation factor eps. The obtained model is 
compared to a kriging one. This first example is used to illustrate the efficiency of the co- 
kriging method compared to the kriging. It will also allow us to highlight the difference 
between the simple and the universal co-kriging. 

We use the experimental design sets presented in Section 15.31 To validate and compare 
our models, the 175 simulations of the complex code uniformly spread on [0.5, 1.5] x [1.5,2.3] 
are used. To build the different correlation matrices, we consider a tensorised matern-| kernel 



(see Rasmussen and Williams, 2006 1): 

r(x,x';0 t ) = r ld (xi ) xi;9t t i)rid(x2,x' 2 ;9t,2) 



with x = (x!,x 2 ) G [0.5,1.5] x [1.5,2.3], 9 t ,i,9 tj2 € R 

5 (xi 



and: 



rid(xi,x'i;9 t ,: 



1 + V5 1 - 



J\2 



+ 



HA 



I 2 

7 tA 



exp 



X; 



HA 



(25) 



(26) 



Then, we consider g\{x) = 1, /bOc) = 1, fi(x) = 1 (see Section [2.11 and l2.2p and, using the 
concentrated maximum likelihood (see Section 15. ip , we have the following estimations for the 
correlation hyper-parameters: 6\ = (0.69,1.20) and 9 2 = (0.27,1.37). 

According to the values of the hyper-parameter estimates, the co-kriging model are very 
smooth since the correlation lengths are large compared to the size of the input parameter 
space. Furthermore, the estimated correlation between the two codes is 82.64%, which shows 
that the amount of information contained in the cheap code is substantial. 

Table [1] presents the results of the parameter estimations (see Section I2.3P . 



Trend coefficient 
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8.84 


0.48 
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/ 1.98 -18.13 \ 
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V 0.74 ) 


V -18.13 165.82 J 


Variance coefficient 


Qt 


2a t 




6.98 


24 


4 


0.06 
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Table 1: Application: hydrodynamic simulator. Parameter estimation results for the response 
eps (see equations (|14[) and (fT6|) ). 

We see in Table [1] that the correlation between (3 pi and f3 2 is important which highlights 
the importance of taking into account the correlation between these two coefficients for the 
parameter estimation. We also see that the adjustment parameter f3 pi is close to 1, which 
means that the two codes are highly correlated. 

Figure [3] illustrates the contour plot of the kriging and co-kriging mean, we can see signif- 
icant differences between the two surrogate models. 

Table [2] compares the prediction accuracy of the co-kriging and the kriging models. The 
different coefficients are estimated with the 175 responses of the complex code on the test set: 

• MaxAE: Maximal absolute value of the observed error. 
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Figure 3: Contour plot of the kriging mean (on the left hand side) and the co-kriging mean (on 
the right hand side). The triangles represent the n 2 = 25 points of the experimental design 
set of the expensive code. 



• RMSE : Root mean squared value of the observed error. 

• <?2 = 1 - ||MZ 2 (AeBt) " 2 2 (Acst)||7ll^ 2 (Ac S t) - z 2 \\ 2 , with Z 2 = (J^ill ^{xf St ))/n 2 . 

• RIMSE : Root of the average value of the kriging or co-kriging variance. 



Q 2 RMSE MaxAE RIMSE 
kriging 75.83% 0.133 0.49 0.110 
co-kriging 98.01% 0.038 0.14 0.046 



Table 2: Application: hydrodynamic simulator. Comparison between kriging and co-kriging. 
The co-kriging model provides predictions significantly better than the ones of the kriging 
model. 

We can see that the difference of accuracy between the two models is important. Indeed, the 
one of the co-kriging model is significantly better. Furthermore, comparing the RMSE and 
the RIMSE estimations in Table [2j we see that we have a good estimation of the predictive 
distribution variances for the two models. We note that the predictive variance for the co- 
kriging is obtained with a simple co-kriging model. Therefore, it will be slightly larger in the 
universal co-kriging case. Indeed, by computing the universal co-kriging equations, we find 
RIMSE = 0.058. 

We can compare the RMSE obtained with the test set with the RMSE obtained with 
a Leave-One-Out cross validation procedure (see Section 2]). For this procedure, we test 
our model on n 2 = 5 validation sets obtained by removing one observation at a time. As 
presented in Section [U we can either choose to remove the observations from z 2 or from z 2 
and Z\. The root mean squared error of the Leave-One-Out cross validation procedure obtained 
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by removing observations from Z2 is RMSE 22j £oo = 4.80. 10~ 3 whereas the one obtained by 
removing observations from Z2 and z\ is RMSE 2l iZ2j £oo = 0.10. Comparing RMSE 22) £0O and 
RMSE Zl t z2,LOO to the RMSE obtained with the external test set, we see that the procedure 
which consists in removing points from Z2 and Z\ provides a better proxy for the generalization 
error. Indeed, RMSE Z2) ^oo is a relevant proxy for the generalization error only at points where 
z\ is available. Therefore, it underestimates the error at locations where z\{x) is unknown. 

Figure H] represents the mean and confidence intervals at plus or minus twice the standard 
deviation of the simple and universal co-krigings for points along the vertical line X\ = 0.99 
and the horizontal line X2 = 1.91 (x = (0.99, 1.91) corresponds to the coordinates of the point 
of L>2 in the center of the domain [0.5, 1.5] x [1.5,2.3]). 




Figure 4: Mean and confidence intervals for the simple and the universal co-kriging. The figure 
on the left hand side represents the predictions along the vertical line x\ = 0.99 and the figure 
on the right hand side represents the predictions along the horizontal line X2 = 1.91. The 
solid black lines represent the mean of the two co-kriging models, the dashed lines represent 
the confidence interval at plus or minus twice the standard deviation of the simple co-kriging 
and the dotted lines represent the same confidence intervals for the universal co-kriging. 



In Figure [J] on the right hand side, we see a necked point around the coordinates x\ = 1.5 
since, in the direction of X2, the hyper-parameters of correlation for Z\{x) and 62 (x) are large 
(di,2 = 1-20 and #2,2 = 1-37) and a point of D2 have almost the same coordinate. 

5.5 Multi-fidelity surrogate model for the mixture characteristic length L c 

In this section, we build a co-kriging model for the mixture characteristic length L c . The 
aim of this example is to highlight that it could be worth having an adjustment coefficient p\ 
depending on x. We use the same training and test sets as in the previous section and we 
consider a tensorised matern-| kernel (|25p . Let us consider the two following cases: 

• Case 1: g\(x) = 1, f2(%) = 1 and /i(x) = 1 

• Case 2: gf (x) = ( 1 x\ ), $2{x) = 1 and fi(x) = 1 
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We have the following hyper-parameter maximum likelihood estimates for the two cases 

• Case 1: §i = (0.52, 1.09) and <9 2 = (0.03,0.02) 

• Case 2: §i = (0.52, 1.09) and 9 2 = (0.14, 1.37) 

The estimation of Q\ is identical in the two cases since it does not depend on p\ and it is 
estimated with the same observations. Furthermore, we see an important difference between 
the estimates of (9 2 . Indeed, they are larger in the Case 2 than in the Case 1 which suppose 
that the model is smoother in the Case 2. Table [3] presents the estimations of Pi and a\ for 
the two cases (see Section |2.3|) . 



Trend coefficient 


Eii/i 




Pi 


1.26 


0.97 


Variance coefficient 


Qi 


2ai 


4 


15.62 


24 



Table 3: Application: hydrodynamic simulator. Estimations of Pi and o\ for the response L c 
(see equations (|14|> and (|16[) ). 



Then, Table 2] presents the estimations of 02, Pp 1 an d a\ for the Case 1, i.e. when p\ is 
constant (see Section f2.3p . 



Trend coefficient 




S 2 /(7 2 2 
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\ -0.26 J 


^ -0.79 0.95 J 


Variance coefficient 


Q2 


2a 2 




0.01 


3 



Table 4: Application: hydrodynamic simulator. Estimations of /3 2 , (3 pi and a\ for the Case 1, 
i.e. when p\ is constant, for the response L c (see equations (|14p and (|16p ). 

Finally, Table [5] presents the estimations of /3 2 , f3 pi and cr| for the Case 2, i.e. when pi 
depends on x (see Section 12. 3p . 



Trend coefficient 



P P1 
Pi 



Variance coefficient 



(To 



S 2 z^ 2 




Q-2 



3.24.10~ 4 



2.34 -3.50 0.44 
-3.50 9.18 -3.67 
0.44 -3.67 2.60 



2a 2 



Table 5: Application: hydrodynamic simulator. Estimations of /3 2 , P pi and cr 2 for the Case 2, 
i.e. when p\ depends on x, for the response L c (see equations (I14p and ()16[) ) . 
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We see in Table 0] that the adjustment coefficient is around 1.5 which indicates that the 
magnitude of the expensive code is slightly more important than the one of the cheap code. 
Furthermore, we see in Table [5] that if we consider an adjustment coefficient which linearly 
depends on x\ (i.e. with gf(x) = ( 1 xi )), the constant part of /3 pi is more important (it 
is around 1.66) and there is a negative slop in the direction x\ (it is around —0.48). Since 
x G [0.5,1.5], the averaged value of p\ is 1.18 and goes from 1.42 at x\ = 0.5 to 0.94 at 
x\ = 1.5. We see also a significant difference between the two case for the variance estimation. 
Indeed, the variance estimate in the Case 1 (see Table 3|) is much more important than the 
one in the Case 2 (see Table [5]) . This could mean that we learn better in the Case 2 than in 
the Case 1. 

Figure [5] illustrates the contour plot of the two co-kriging models, i.e. when p\ is constant 
and when p depends on x. 




Figure 5: Contour plot of the co-kriging mean when p\ is constant (on the left hand side) and 
when p\ is depends on x (of the right hand side). The triangles represent the n2 = 5 points 
of the experimental design set of the expensive code. 



Furthermore, Table [6] compares the prediction accuracy of the co-kriging in the two cases. 
The precision is computed on the test set of 175 observations. 

RMSE MaxAE 
Case 1 7.26.10" 3 0.23 
case 2 1.53.10~ 3 0.16 



Table 6: Application: hydrodynamic simulator. Comparison between co-kriging when p\ 
is constant (Case 1) and co-kriging when p\ depends on x (Case 2). The Case 2 provides 
predictions better than the Case 1, it is hence worthwhile to consider an adjustment coefficient 
not constant. 

We see that the co-kriging model in Case 2 is clearly better than the one in Case 1. 
Therefore, we illustrate in this application that it can be worth considering an adjustment 
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coefficient not constant contrary to the model presented in |Kennedy and O'Hagan, 2000| and 
|Forrester et al., 2007 . 



6 Conclusion 

We have presented in this paper a recursive formulation for a multi-fidelity co-kriging model. 
This model allows us to build surrogate models using data from experiments of different levels 
of fidelity. 

The strength of the suggested approach is that it considerably reduces the complexity 
of the co-kriging model while it preserves its predictive efficiency. Therefore, the proposed 
method is competitive regarding the Bayes linear approach in which the principal strength is 
a low computational cost but with a low predictive efficiency. Furthermore, one of the most 
important consequences of the recursive formulation is that the construction of the surrogate 
model is equivalent to build s independent krigings. Consequently, we can naturally adapt 
results of kriging to the proposed co-kriging model. 

In this paper, we first prove that our model is equivalent to another very popular model 
in terms of predictive distributions whereas it reduces its complexity. Then, we present a 
Bayesian estimation of the model parameters which provides closed form expressions for the 
parameters of the posterior distributions. We note that, from these posterior distributions, 
we can deduce the maximum likelihood estimates of the parameters. Then, thanks to the 
joint distributions of the parameters and the recursive formulation, we can deduce closed form 
formulas for the mean and covariance of the posterior predictive distribution. Due to their 
similarities with the universal kriging equations, we call these formulas the universal co-kriging 
equations. Finally, we present closed form expressions for the cross-validation equations of 
the co-kriging surrogate model. These expressions reduce considerably the complexity of 
the cross-validation procedure and are derived from the one of kriging model that we have 
extended. 

The suggested model has been successfully applied to a hydrodynamic code. We also 
present in this application a practical way to design the experiments of the multi-fidelity 
model and we show that it is worth using this co-kriging model instead of a kriging model. 

7 Acknowledgements 

The author particularly thanks Professor Josselin Gamier for supervising his work and for his 
fruitful guidance. He is also grateful to Dr. Claire Cannamela for providing the data for the 
application and for her interesting discussions and helpful comments. 

A Proofs 

A.l Proof of Proposition [I] 

Let us consider the co-kriging mean of the model ([TJ presented in |Kennedy and O'Hagan, 2000 
for a t- level co-kriging with t = 2, . . . , s: 

m Zt (x) = h! t {xf '/3® +t t (x) T V t - 1 (zV - Ht p®) 
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where /3^> = ((3f , . . . ,/3f) T , z^> = (zj , . . . ,zJ) T and h' t (x) T is defined in equation ©. We 
have: 

Then, from equations ([7]) and ([S]), we have the following equality: 

UixfVf'z^ = Pt ^{x)t t ^{x) T V t z\z^ 1] ~ {pI-i(D t )) {rf(x)R^z t ^(D t )) 
+rf(x)Rt 1 z t 

and with equation ©: 

t t (x) T V t ~ l H t p^ = p t ^(x)t t ^(x) T V t z\H t ^/3^ +rJ(x)Rt 1 F t (D t )/3 t 

where stands for the element by element matrix product. We hence obtain the recursive 
relation: 

m Zt {x) = p i „ 1 (x)m Zt _ 1 (x) + fj{x)p t + rf(x)R^ 1 [z t - p t -i(D t ) z t -x{D t ) - F t {D t )f3 t ] 

The co-kriging mean of the model (J9j> satisfies the same recursive relation ([6]), and we have 
mzi( x ) — f J 'Z 1 (x)- This proves the first equality of Proposition [TJ 

p, Ze (x) = m z s (x) 

We follow the same guideline for the co-kriging covariance: 

s 2 Zt (x,x') = v 2 Zt (x,x') - t?(x)V t -H t (x>) 

where v Zt (x,x') is the covariance between Zt(x) and Zt(x') and s Zt (x,x') is the covariance 
function of the conditioned Gaussian process [Z t {x)\Z® = zM,/3,/3 p ,ct 2 ] for the model ©. 
From equation ([8]), we can deduce the following equality: 

a z t ( x , x ') = Pt-i{ x )pt-i(x')v Ztl (x, x) + v 2 (x,x') 

where a Zt (x,x') is the covariance function of the conditioned Gaussian process [Zt(x)\Z^ = 
z^\ f3t, (3 pt _ 1 ,a 2 ] of the recursive model Then, from equation (|7|) and ([5]), we have: 

tf(x)V t -h t (x') = fH - 1 (x)p t - 1 (x')ll_ 1 (x)V t Z\tt-l&) + a 2 rf(x)R7 l rt(x') 
Finally we can deduce the following equality: 

s 2 Zt (x,x') = p t ^(x)p t ^(x') (v%_ 1 (x,x?)-ll_ 1 (x)V£.\tt-i{a/j) + c7 2 {l-rf(x)R^r t (x')) 

which is equivalent to: 

s 2 Zt (x,x') = /3i _ 1 (x) / ot_ 1 (x , )4 t _ 1 (^^') + (1 - rJ{x)R^ 1 r t {x')) 

This is the same recursive relation as the one satisfies by the co-kriging covariance a Zt (x,x') 
of the model ([9]) (see equation (fl2j) ). Since s Zi (x,x') = a Zi (x,x'), we have : 

°% s { x i x ') = s z s ( x , x ') 

This equality with x = x' proves the second equality of Proposition [TJ □ 
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A. 2 Proof of Proposition [2] 

Noting that the mean of the predictive distribution in equation (|10p do not depend on of and 
thanks to the law of total expectation, we have the following equality: 



E 



Z t {x)\Z^ =zW 



E 



E 



Z t {x)\zV=z«\olp t ^ pt _ 1 ] Z<*W*> 



From the equations (jlip and ()14j) . we directly deduce the equation (|17p . Then, we have the 
following equality: 



var L Zt (x) z®,(%) = (hf(x) - r t (xf R^ 1 H t )Z t (h? (x) - r t (x) T RJ 1 H t ) T 



(27) 



The law of total variance states that: 

var(Z t {x)\z {t \a?) = E 
+ var 



var^^lAA,/^,^ 



(E [z t (s)|A 



A*? 



Thus, from equations (jlip . (|17p and (|27p . we obtain: 



var 



(Z t (x)\zM = z^,a1) = pf(x)var(Z t _ 1 (x)|2:(*- 1 ) = z^- 1 ), of) + of (l - rJ{x)Rj 1 rl{x)) 



(28) 



Again using the law of total variance and the independence between E \Zt{x)\Z^ = , /3t,P pt _ 1 



and of, we have: 



var 



(Z t (x)\z®)=E a , [var(Z t (x))|zW, 



(29) 



We obtain the equation (|18p from equation (fT6|) by noting that the mean of an inverse Gamma 
distribution TQ(a, b) is bj (a — 1) □ 

A. 3 Proof of Proposition [3] 

Let us consider that £ s is the index of the k last points of D s . We denote by Dtest these points. 
First we consider the variance and the trend parameters as fixed, i.e. a t = 2(a t -i) an< ^ 
\t _£ ( = T^ut, and V s = 0, i.e. we are in the simple co-kriging case. Thanks to the block-wise 
inversion formula, we have the following equality: 



R„ 



A B 



(30) 



with A - [R s l ] Ksj _ 6] + [R s l ] { _^_ is] [R s x ] { _ isM Q 1 [R, x ] Ksj _ 6] [R s x ] Ksi _ 6]I 

B = - IR7 1 ] r , , , [iC 1 ! r , , , Q" 1 and: 



[ fis ,-e>] ^ lj 

2= [^V,6] - [^] [6,-6] 



[Rs 1 ] 



-6,-6 



-6,6 



(31) 



We note that = ([Rj 1 



2(o t -l) 



I [6,6 



represents the covariance matrix of the 



points in -Dtcst with respect to the covariance kernel of a Gaussian process of kernel 2(a-i) rs ( x i x ') 



20 



(which is the one of S s (x)) conditioned by the points D s \ Aest- Therefore, from the previous 
remark and the equation (|12|) . we can deduce the equation (|2ip . 
Furthermore, we have the following equality: 

K 1 ] [RJ 1 (z s ~ H s \ s KJ = z s (D te8t ) - hj (D tcst )T, s i^ s 

- fc'U^h.t.y (32) 

x (z,_i(£>, \ Ao,t) - [flf][_ e .|E,K.) 

From this equation and equation (fTT|) . we can directly deduce the equation ()19p with £z Si £ s = 
^s(Aest) - MZ s (Aest)- 

Then, we suppose the trend and the variance parameters as unknown and we have to 
re-estimate them when we remove the observations. Thanks to the parameter estimations 
presented in Section 12.31 we can deduce that the estimates of and \t,-£ t wnen we 

remove observations of index £t are given by the following equations: 

A s ,_ 6 ([HJU 3 K S [H S U 3 ) = [HJU s K s z s (D test ) (33) 

and: 

2 (Zs(Aest) ~ [#s]~6 A s -jsf K s MAest) ~ [H a ]-£ a \ s ,-g s ) /o,A 

°s,-6 ~~ Z Z Z \ ' 

l^s Ps Qs—1 Strain 

with K s = {[R s \_ is> _^ . 

From the equality (|30p . we can deduce that K s = A- BQB T fr om which we obtain the 
equation (I20p . Finally, to obtain the cross-validation equations for the universal co-kriging, 
we just have to estimate the following quantity (see equation (jTHJ) ) : 



h T s {D tcst ) T - [R- 1 ] h6)W K S [H S U S ) S s (^(Aest) T - [RJ 1 ] hM K*[H.U.) (35) 
with S s = (^[Hj]_^ s K s [H s ]_^ a s j 1 . The following equality: 

(/^(Aestf - [r; 1 ] [HsU k s [h s u s ) = (([R^k^y 1 [RJ'Hs] (36) 

allows us to obtain the equation (|23j) and completes the proof. □ 
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